Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
Authors
Abstract
In this paper we describe a method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training. We use the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI. To make its computation feasible we use a phone n-gram language model in place of the word language model. To further reduce its space and time complexity, we compute the objective function using neural network outputs at one third of the standard frame rate. These changes enable us to perform the computation for the forward-backward algorithm on GPUs. Furthermore, the reduced output frame rate also provides a significant speed-up during decoding. We present results on 5 different LVCSR tasks with training data ranging from 100 to 2100 hours. Models trained with LF-MMI provide a relative word error rate reduction of ∼11.5% over those trained with the cross-entropy objective function, and ∼8% over those trained with the cross-entropy and sMBR objective functions. A further relative reduction of ∼2.5% can be obtained by fine-tuning these models with the word-lattice-based sMBR objective function.
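The core quantity the abstract describes — the MMI objective, evaluated with the forward algorithm over a numerator (reference) graph and a denominator (phone n-gram) graph, using network outputs at a reduced frame rate — can be illustrated with a toy, pure-Python sketch. The function names (`forward_logprob`, `lf_mmi_objective`) and the dense transition-matrix representation of the graphs are illustrative assumptions, not the paper's implementation: the actual system runs the forward-backward recursion over FST-based graphs on the GPU.

```python
import math

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def _logsum(xs):
    """log-sum-exp over an iterable of log values."""
    acc = -math.inf
    for x in xs:
        acc = log_add(acc, x)
    return acc

def forward_logprob(log_init, log_trans, log_obs):
    """Forward algorithm in log space over a small HMM-like graph.

    log_init[j]     : log probability of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log acoustic score of state j at frame t
    Returns the total log probability summed over all state paths.
    """
    n = len(log_init)
    alpha = [log_init[j] + log_obs[0][j] for j in range(n)]
    for t in range(1, len(log_obs)):
        alpha = [
            log_obs[t][j]
            + _logsum(alpha[i] + log_trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return _logsum(iter(alpha))

def lf_mmi_objective(log_obs, num_graph, den_graph, subsample=3):
    """MMI objective: log p(numerator) - log p(denominator).

    `subsample=3` mimics evaluating the objective on every third
    network output frame, as in the paper's reduced frame rate.
    Each graph is a (log_init, log_trans) pair.
    """
    obs = log_obs[::subsample]
    num = forward_logprob(*num_graph, obs)
    den = forward_logprob(*den_graph, obs)
    return num - den
```

In training, the gradient of this objective with respect to the network outputs is the difference of numerator and denominator state occupancies from a full forward-backward pass; the sketch above only evaluates the objective value itself.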
Similar papers
Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline
This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art system, with a simplified single system comparable to the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe through the main repository in the Kald...
LatticeRnn: Recurrent Neural Networks Over Lattices
We present a new model called LATTICERNN, which generalizes recurrent neural networks (RNNs) to process weighted lattices as input, instead of sequences. A LATTICERNN can encode the complete structure of a lattice into a dense representation, which makes it suitable for a variety of problems, including rescoring, classifying, parsing, or translating lattices using deep neural networks (DNNs). In...
Sequence-discriminative training of deep neural networks
Sequence-discriminative training of deep neural networks (DNNs) is investigated on a 300 hour American English conversational telephone speech task. Different sequence-discriminative criteria — maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI — are compared. Two different heuristics are investigated to improve the performance of ...
Task-aware deep bottleneck features for spoken language identification
Recently, deep bottleneck features (DBF), extracted from a deep neural network (DNN) containing a narrow bottleneck layer, have been applied for language identification (LID), and yield significant performance improvement over state-of-the-art methods on NIST LRE 2009. However, the DNN is trained using a large corpus of a specific language which is not directly related to the LID task. More recent...
A Neural Network Based Regression Approach for Recognizing Simultaneous Speech
This paper presents our approach for automatic speech recognition (ASR) of overlapping speech. Our system consists of two principal components: a speech separation component and a feature estimation component. In the speech separation phase, we first estimate the speaker’s position, and then the speaker location information is used in a GSC-configured beamformer with a minimum mutual informatio...
Publication date: 2016